Overview
Dataset statistics
| Number of variables | 20 |
|---|---|
| Number of observations | 50000 |
| Missing cells | 50168 |
| Missing cells (%) | 5.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 33.7 MiB |
| Average record size in memory | 707.2 B |
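The missing-cell percentage follows directly from the counts in this table. A minimal sketch of the arithmetic, assuming the percentage is taken over the full rows × columns grid (the function name is illustrative and not part of ydata-profiling):

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of empty cells over the whole rows x columns grid, in percent."""
    return 100.0 * missing_cells / (n_rows * n_cols)

# Figures copied from the Dataset statistics table.
pct = missing_cell_pct(missing_cells=50168, n_rows=50000, n_cols=20)
print(f"{pct:.1f}%")  # 5.0%
```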
Variable types
| Text | 4 |
|---|---|
| Categorical | 5 |
| Numeric | 11 |
Alerts
| Alert | Type |
| Aromaticity is highly overall correlated with Oxidized_coefficient and 1 other field | High correlation |
| Function_Prediction_source is highly overall correlated with Protein_source | High correlation |
| Function_prediction_source is highly overall correlated with Phage_source and 1 other field | High correlation |
| Molecular_weight is highly overall correlated with Oxidized_coefficient and 1 other field | High correlation |
| Oxidized_coefficient is highly overall correlated with Aromaticity and 2 other fields | High correlation |
| Phage_source is highly overall correlated with Function_prediction_source and 1 other field | High correlation |
| Protein_source is highly overall correlated with Function_Prediction_source and 2 other fields | High correlation |
| Reduced_coefficient is highly overall correlated with Aromaticity and 2 other fields | High correlation |
| Start is highly overall correlated with Stop | High correlation |
| Stop is highly overall correlated with Start | High correlation |
| Protein_source is highly imbalanced (94.2%) | Imbalance |
| Function_prediction_source has 22870 (45.7%) missing values | Missing |
| Function_Prediction_source has 27130 (54.3%) missing values | Missing |
| Protein_ID has unique values | Unique |
| Aromaticity has 8261 (16.5%) zeros | Zeros |
| Instability_index has 836 (1.7%) zeros | Zeros |
| Helix_fraction has 2231 (4.5%) zeros | Zeros |
| Turn_fraction has 2985 (6.0%) zeros | Zeros |
| Sheet_fraction has 2367 (4.7%) zeros | Zeros |
| Reduced_coefficient has 13696 (27.4%) zeros | Zeros |
| Oxidized_coefficient has 13220 (26.4%) zeros | Zeros |
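The two Missing alerts deserve a second look: 22870 + 27130 = 50000, exactly the number of observations, and the two column names differ only in capitalization. That pattern would be consistent with one logical field split across two columns, although the report alone cannot confirm it. A minimal sketch of how such a pair could be checked and coalesced, assuming rows are available as dictionaries with `None` for missing cells (the sample rows, the value `example_tool`, and the helper name are hypothetical):

```python
from typing import Optional

LOWER = "Function_prediction_source"
UPPER = "Function_Prediction_source"

def coalesce(row: dict, primary: str, fallback: str) -> Optional[str]:
    """Return the primary value, or the fallback when the primary is missing."""
    value = row.get(primary)
    return value if value is not None else row.get(fallback)

# Hypothetical rows illustrating the suspected layout: each row fills
# exactly one of the two case-variant columns.
rows = [
    {LOWER: "RefSeq", UPPER: None},
    {LOWER: None, UPPER: "example_tool"},
]

# The columns are complementary if every row has exactly one of them set.
complementary = all((r[LOWER] is None) != (r[UPPER] is None) for r in rows)

# Only merge when the check passes; otherwise the columns carry distinct data.
merged = [coalesce(r, LOWER, UPPER) for r in rows] if complementary else None
```

If the check fails on the real data, the two columns should be kept apart and documented separately.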
Reproduction
| Analysis started | 2025-07-29 12:15:02.726461 |
|---|---|
| Analysis finished | 2025-07-29 12:15:20.462748 |
| Duration | 17.74 seconds |
| Software version | ydata-profiling v0.0.dev0 |
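The reported duration is simply the difference between the two timestamps. A short sketch that recomputes it from the values in the Reproduction table:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2025-07-29 12:15:02.726461", FMT)
finished = datetime.strptime("2025-07-29 12:15:20.462748", FMT)

# Elapsed wall-clock time of the profiling run, in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")  # 17.74 seconds
```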
Variables
Phage_ID
Text
| Distinct | 47845 |
|---|---|
| Distinct (%) | 95.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.4 MiB |
Length
| Max length | 87 |
|---|---|
| Median length | 85 |
| Mean length | 34.65984 |
| Min length | 5 |
Unique
| Unique | 45794 |
|---|---|
| Unique (%) | 91.6% |
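Distinct and Unique measure different things: Distinct counts how many different values occur, while Unique counts the values that occur exactly once. A small sketch with made-up identifiers (not taken from the dataset) shows the difference:

```python
from collections import Counter

# Made-up identifiers for illustration only.
phage_ids = [
    "NC_0001.1",
    "NC_0002.1", "NC_0002.1",
    "NC_0003.1", "NC_0003.1",
    "NC_0004.1",
]

counts = Counter(phage_ids)
distinct = len(counts)                              # different values: 4
unique = sum(1 for c in counts.values() if c == 1)  # seen exactly once: 2
```

Unique can never exceed Distinct, which matches the figures above (45794 unique out of 47845 distinct values).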
Sample
| 1st row | NC_011019.1 |
|---|---|
| 2nd row | NC_008723.1 |
| 3rd row | NC_017968.1 |
| 4th row | NC_009820.1 |
| 5th row | NC_018863.1 |
| Value | Count | Frequency (%) |
| imgvr_uvig_3300008299_000009|3300008299|ga0114868_1000024 | 4 | < 0.1% |
| station168_dcm_all_assembly_node_569_length_94681_cov_11.157230 | 4 | < 0.1% |
| imgvr_uvig_3300045988_102218|3300045988|ga0495776_017886 | 4 | < 0.1% |
| mgv-genome-0378315 | 4 | < 0.1% |
| mycobacterium_phage_porcelain | 4 | < 0.1% |
| nc_030936.1 | 4 | < 0.1% |
| imgvr_uvig_3300029604_000307|3300029604|ga0245147_100033|79620-241832 | 4 | < 0.1% |
| mgv-genome-0380120 | 4 | < 0.1% |
| uvig_555935 | 3 | < 0.1% |
| station180_zzz_all_assembly_node_179_length_172350_cov_80.433280 | 3 | < 0.1% |
| Other values (47835) | 49962 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 189725 | 10.9% |
| _ | 138641 | 8.0% |
| 3 | 107075 | 6.2% |
| 1 | 90650 | 5.2% |
| 2 | 84127 | 4.9% |
| 8 | 82152 | 4.7% |
| 5 | 80230 | 4.6% |
| 4 | 78857 | 4.6% |
| 9 | 73716 | 4.3% |
| 7 | 70518 | 4.1% |
| Other values (57) | 737301 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1732992 |
Protein_source
Categorical
High correlation  Imbalance 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.1 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 7.97216 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | RefSeq |
|---|---|
| 2nd row | RefSeq |
| 3rd row | RefSeq |
| 4th row | RefSeq |
| 5th row | RefSeq |
Common Values
| Value | Count | Frequency (%) |
| prodigal | 49210 | 98.4% |
| RefSeq | 521 | 1.0% |
| Genbank | 242 | 0.5% |
| DDBJ | 20 | < 0.1% |
| EMBL | 7 | < 0.1% |
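The 94.2% imbalance figure reported for this variable can be reproduced from these counts if the score is taken to be one minus the Shannon entropy of the class distribution, normalized by the maximum entropy for that number of classes. That definition is an assumption here; the report itself does not state the formula.

```python
import math

def imbalance_score(counts: list[int]) -> float:
    """1 - H / H_max: 0 for a uniform split, close to 1 when one class dominates."""
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # not meaningful for a single class; treated as 0 here
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)

# Counts for prodigal, RefSeq, Genbank, DDBJ, EMBL from the table above.
score = imbalance_score([49210, 521, 242, 20, 7])
print(f"{100 * score:.1f}%")  # 94.2%
```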
Words
| Value | Count | Frequency (%) |
| prodigal | 49210 | |
| refseq | 521 | 1.0% |
| genbank | 242 | 0.5% |
| ddbj | 20 | < 0.1% |
| embl | 7 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 49452 | |
| r | 49210 | |
| p | 49210 | |
| o | 49210 | |
| d | 49210 | |
| i | 49210 | |
| g | 49210 | |
| l | 49210 | |
| e | 1284 | 0.3% |
| R | 521 | 0.1% |
| Other values (13) | 2881 | 0.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 398608 |
Function_prediction_source
Categorical
High correlation  Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 22870 |
| Missing (%) | 45.7% |
| Memory size | 3.0 MiB |
Length
| Max length | 16 |
|---|---|
| Median length | 13 |
| Mean length | 11.486989 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | RefSeq |
|---|---|
| 2nd row | RefSeq |
| 3rd row | RefSeq |
| 4th row | RefSeq |
| 5th row | RefSeq |
Common Values
| Value | Count | Frequency (%) |
| eggNOG-mapper | 10992 | 22.0% |
| Iterative search | 9898 | 19.8% |
| - | 5450 | 10.9% |
| RefSeq | 521 | 1.0% |
| Genbank | 242 | 0.5% |
| DDBJ | 20 | < 0.1% |
| EMBL | 7 | < 0.1% |
| (Missing) | 22870 | 45.7% |
Words
| Value | Count | Frequency (%) |
| eggnog-mapper | 10992 | |
| iterative | 9898 | |
| search | 9898 | |
| - | 5450 | |
| refseq | 521 | 1.4% |
| genbank | 242 | 0.7% |
| ddbj | 20 | 0.1% |
| embl | 7 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 52962 | |
| a | 31030 | 10.0% |
| r | 30788 | 9.9% |
| g | 21984 | 7.1% |
| p | 21984 | 7.1% |
| t | 19796 | 6.4% |
| - | 16442 | 5.3% |
| G | 11234 | 3.6% |
| m | 10992 | 3.5% |
| N | 10992 | 3.5% |
| Other values (21) | 83438 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 311642 |
Start
Real number (ℝ)
High correlation 
| Distinct | 34302 |
|---|---|
| Distinct (%) | 68.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29069.426 |
| Minimum | 1 |
|---|---|
| Maximum | 448958 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1312.95 |
| Q1 | 8944.75 |
| median | 20886 |
| Q3 | 37679.25 |
| 95-th percentile | 87545.15 |
| Maximum | 448958 |
| Range | 448957 |
| Interquartile range (IQR) | 28734.5 |
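Start holds integer coordinates, yet the percentiles include fractional values such as 1312.95. That is what linear interpolation between neighbouring sorted values produces, which is the default convention in NumPy and pandas; the sketch below assumes the report uses it. The data are a toy sample, not values from the dataset.

```python
import math

def percentile_linear(values: list[float], q: float) -> float:
    """Percentile (q in [0, 1]) with linear interpolation between sorted values."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)        # fractional index into the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)  # clamp at the last element
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

# Toy integer data: the quartiles land between observed values.
sample = [1, 2, 3, 4]
q1 = percentile_linear(sample, 0.25)  # 1.75
q3 = percentile_linear(sample, 0.75)  # 3.25
iqr = q3 - q1                         # 1.5
```

The interquartile range is then Q3 minus Q1, which for the table above gives 37679.25 - 8944.75 = 28734.5.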
Descriptive statistics
| Standard deviation | 31133.898 |
|---|---|
| Coefficient of variation (CV) | 1.0710187 |
| Kurtosis | 14.952845 |
| Mean | 29069.426 |
| Median Absolute Deviation (MAD) | 13465.5 |
| Skewness | 2.9461042 |
| Sum | 1.4534713 × 10^9 |
| Variance | 9.6931959 × 10^8 |
| Monotonicity | Not monotonic |
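Several of these rows are derived from the others, which gives a quick consistency check on the table: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of non-missing observations. A short sketch using the figures reported for Start:

```python
import math

# Values reported for Start (50000 observations, none missing).
mean, std, n = 29069.426, 31133.898, 50000

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance
total = mean * n      # sum of all values

# The tolerances allow for the rounding of the printed figures.
assert math.isclose(cv, 1.0710187, rel_tol=1e-6)
assert math.isclose(variance, 9.6931959e8, rel_tol=1e-6)
assert math.isclose(total, 1.4534713e9, rel_tol=1e-6)
```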
| Value | Count | Frequency (%) |
| 1 | 216 | 0.4% |
| 3 | 179 | 0.4% |
| 2 | 178 | 0.4% |
| 50 | 30 | 0.1% |
| 61 | 12 | < 0.1% |
| 90 | 9 | < 0.1% |
| 4015 | 8 | < 0.1% |
| 1416 | 7 | < 0.1% |
| 2109 | 7 | < 0.1% |
| 22448 | 7 | < 0.1% |
| Other values (34292) | 49347 |
| Value | Count | Frequency (%) |
| 1 | 216 | |
| 2 | 178 | |
| 3 | 179 | |
| 6 | 2 | < 0.1% |
| 8 | 1 | < 0.1% |
| 10 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
| 13 | 2 | < 0.1% |
| 14 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 448958 | 1 | |
| 414661 | 1 | |
| 397002 | 1 | |
| 380976 | 1 | |
| 367073 | 1 | |
| 365881 | 1 | |
| 357531 | 1 | |
| 345250 | 1 | |
| 343402 | 1 | |
| 341337 | 1 |
Stop
Real number (ℝ)
High correlation 
| Distinct | 34692 |
|---|---|
| Distinct (%) | 69.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29759.046 |
| Minimum | 65 |
|---|---|
| Maximum | 449674 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 65 |
|---|---|
| 5-th percentile | 1974.95 |
| Q1 | 9667.25 |
| median | 21581.5 |
| Q3 | 38343.25 |
| 95-th percentile | 88182.65 |
| Maximum | 449674 |
| Range | 449609 |
| Interquartile range (IQR) | 28676 |
Descriptive statistics
| Standard deviation | 31131.486 |
|---|---|
| Coefficient of variation (CV) | 1.0461184 |
| Kurtosis | 14.975301 |
| Mean | 29759.046 |
| Median Absolute Deviation (MAD) | 13463 |
| Skewness | 2.9470997 |
| Sum | 1.4879523 × 10^9 |
| Variance | 9.6916945 × 10^8 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1224 | 11 | < 0.1% |
| 12072 | 8 | < 0.1% |
| 3015 | 7 | < 0.1% |
| 5764 | 7 | < 0.1% |
| 11033 | 7 | < 0.1% |
| 1537 | 7 | < 0.1% |
| 2589 | 7 | < 0.1% |
| 9848 | 6 | < 0.1% |
| 10704 | 6 | < 0.1% |
| 14580 | 6 | < 0.1% |
| Other values (34682) | 49928 |
| Value | Count | Frequency (%) |
| 65 | 1 | < 0.1% |
| 66 | 1 | < 0.1% |
| 69 | 1 | < 0.1% |
| 71 | 2 | |
| 72 | 3 | |
| 73 | 1 | < 0.1% |
| 75 | 1 | < 0.1% |
| 78 | 1 | < 0.1% |
| 79 | 1 | < 0.1% |
| 81 | 2 |
| Value | Count | Frequency (%) |
| 449674 | 1 | |
| 415482 | 1 | |
| 398222 | 1 | |
| 382007 | 1 | |
| 368425 | 1 | |
| 366084 | 1 | |
| 358202 | 1 | |
| 346206 | 1 | |
| 346113 | 1 | |
| 342080 | 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | - |
|---|---|
| 2nd row | + |
| 3rd row | - |
| 4th row | + |
| 5th row | - |
Common Values
| Value | Count | Frequency (%) |
| - | 25109 | 50.2% |
| + | 24891 | 49.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| - | 25109 | |
| + | 24891 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 50000 |
Protein_ID
Text
Unique 
| Distinct | 50000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.5 MiB |
Length
| Max length | 91 |
|---|---|
| Median length | 87 |
| Mean length | 37.52778 |
| Min length | 7 |
Unique
| Unique | 50000 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | YP_001994544.1 |
|---|---|
| 2nd row | YP_950675.1 |
| 3rd row | YP_006382285.1 |
| 4th row | YP_001469324.1 |
| 5th row | YP_006908410.1 |
| Value | Count | Frequency (%) |
| np_958633.1 | 1 | < 0.1% |
| biochar_1064_29 | 1 | < 0.1% |
| yp_001994544.1 | 1 | < 0.1% |
| yp_950675.1 | 1 | < 0.1% |
| yp_006382285.1 | 1 | < 0.1% |
| yp_001469324.1 | 1 | < 0.1% |
| yp_006908410.1 | 1 | < 0.1% |
| yp_008052003.1 | 1 | < 0.1% |
| yp_007010876.1 | 1 | < 0.1% |
| yp_003969626.1 | 1 | < 0.1% |
| Other values (49990) | 49990 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 195477 | 10.4% |
| _ | 187851 | 10.0% |
| 3 | 118669 | 6.3% |
| 1 | 108197 | 5.8% |
| 2 | 97637 | 5.2% |
| 4 | 89134 | 4.8% |
| 5 | 89042 | 4.7% |
| 8 | 88366 | 4.7% |
| 9 | 79630 | 4.2% |
| 7 | 77236 | 4.1% |
| Other values (57) | 745150 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1876389 |
Product
Text
| Distinct | 4022 |
|---|---|
| Distinct (%) | 8.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 MiB |
Length
| Max length | 902 |
|---|---|
| Median length | 761 |
| Mean length | 25.97064 |
| Min length | 3 |
Unique
| Unique | 1782 |
|---|---|
| Unique (%) | 3.6% |
Sample
| 1st row | hypothetical protein |
|---|---|
| 2nd row | major tail protein |
| 3rd row | major capsid protein |
| 4th row | hypothetical protein |
| 5th row | RNA polymerase sigma factor |
| Value | Count | Frequency (%) |
| unknown | 19689 | 11.8% |
| protein | 12700 | 7.6% |
| of | 4730 | 2.8% |
| hypothetical | 4383 | 2.6% |
| the | 3883 | 2.3% |
| domain | 3698 | 2.2% |
| phage | 3190 | 1.9% |
| family | 2911 | 1.7% |
| dna | 2844 | 1.7% |
| to | 1995 | 1.2% |
| Other values (5284) | 107471 |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 128202 | 9.9% |
| (space) | 117511 | 9.0% |
| e | 100221 | 7.7% |
| o | 99374 | 7.7% |
| i | 88001 | 6.8% |
| t | 82210 | 6.3% |
| a | 75139 | 5.8% |
| r | 57074 | 4.4% |
| s | 49455 | 3.8% |
| l | 47447 | 3.7% |
| Other values (70) | 453898 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1298532 |
| Distinct | 65 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.2 MiB |
Length
| Max length | 45 |
|---|---|
| Median length | 9 |
| Mean length | 10.45058 |
| Min length | 6 |
Unique
| Unique | 5 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | hypothetical; |
|---|---|
| 2nd row | infection; |
| 3rd row | assembly; |
| 4th row | hypothetical; |
| 5th row | replication; |
| Value | Count | Frequency (%) |
| unsorted | 27503 | |
| hypothetical | 4380 | 8.8% |
| assembly | 3676 | 7.4% |
| replication | 2426 | 4.9% |
| infection | 1978 | 4.0% |
| packaging | 1754 | 3.5% |
| lysis | 1446 | 2.9% |
| assembly;infection | 1434 | 2.9% |
| integration | 1162 | 2.3% |
| regulation | 1080 | 2.2% |
| Other values (55) | 3161 | 6.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| ; | 53910 | |
| e | 50139 | |
| t | 49592 | |
| n | 47148 | |
| o | 43303 | 8.3% |
| s | 42419 | 8.1% |
| r | 35398 | 6.8% |
| u | 30714 | 5.9% |
| i | 29793 | 5.7% |
| d | 27676 | 5.3% |
| Other values (15) | 112437 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 522529 |
Molecular_weight
Real number (ℝ)
High correlation 
| Distinct | 44452 |
|---|---|
| Distinct (%) | 89.1% |
| Missing | 84 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4149.2137 |
| Minimum | 75.0666 |
|---|---|
| Maximum | 8770.8033 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 75.0666 |
|---|---|
| 5-th percentile | 417.4176 |
| Q1 | 2048.0349 |
| median | 4220.8564 |
| Q3 | 6254.9195 |
| 95-th percentile | 7694.0663 |
| Maximum | 8770.8033 |
| Range | 8695.7367 |
| Interquartile range (IQR) | 4206.8846 |
Descriptive statistics
| Standard deviation | 2375.7274 |
|---|---|
| Coefficient of variation (CV) | 0.57257293 |
| Kurtosis | -1.243662 |
| Mean | 4149.2137 |
| Median Absolute Deviation (MAD) | 2099.4409 |
| Skewness | -0.052646883 |
| Sum | 2.0711215 × 10^8 |
| Variance | 5644080.9 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 131.1729 | 111 | 0.2% |
| 146.1876 | 105 | 0.2% |
| 147.1293 | 78 | 0.2% |
| 89.0932 | 76 | 0.2% |
| 174.201 | 58 | 0.1% |
| 75.0666 | 56 | 0.1% |
| 105.0926 | 48 | 0.1% |
| 146.1445 | 46 | 0.1% |
| 117.1463 | 45 | 0.1% |
| 245.2755 | 43 | 0.1% |
| Other values (44442) | 49250 | |
| (Missing) | 84 | 0.2% |
| Value | Count | Frequency (%) |
| 75.0666 | 56 | |
| 89.0932 | 76 | |
| 105.0926 | 48 | |
| 115.1305 | 9 | < 0.1% |
| 117.1463 | 45 | |
| 119.1192 | 28 | 0.1% |
| 121.1582 | 5 | < 0.1% |
| 131.1729 | 111 | |
| 132.1179 | 38 | 0.1% |
| 133.1027 | 41 | 0.1% |
| Value | Count | Frequency (%) |
| 8770.8033 | 1 | |
| 8690.9637 | 1 | |
| 8669.5906 | 1 | |
| 8665.8902 | 1 | |
| 8665.857 | 1 | |
| 8662.5385 | 1 | |
| 8639.6324 | 1 | |
| 8638.6479 | 1 | |
| 8632.7731 | 1 | |
| 8627.7922 | 1 |
Aromaticity
Real number (ℝ)
High correlation  Zeros 
| Distinct | 472 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.089780236 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 8261 |
| Zeros (%) | 16.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.041666667 |
| median | 0.083333333 |
| Q3 | 0.125 |
| 95-th percentile | 0.2 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.083333333 |
Descriptive statistics
| Standard deviation | 0.079517352 |
|---|---|
| Coefficient of variation (CV) | 0.88568883 |
| Kurtosis | 32.038142 |
| Mean | 0.089780236 |
| Median Absolute Deviation (MAD) | 0.041666667 |
| Skewness | 3.596842 |
| Sum | 4489.0118 |
| Variance | 0.0063230093 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 8261 | 16.5% |
| 0.1428571429 | 1055 | 2.1% |
| 0.1111111111 | 970 | 1.9% |
| 0.09090909091 | 967 | 1.9% |
| 0.125 | 954 | 1.9% |
| 0.1 | 952 | 1.9% |
| 0.07692307692 | 800 | 1.6% |
| 0.08333333333 | 799 | 1.6% |
| 0.1666666667 | 774 | 1.5% |
| 0.07142857143 | 703 | 1.4% |
| Other values (462) | 33765 |
| Value | Count | Frequency (%) |
| 0 | 8261 | |
| 0.01428571429 | 22 | < 0.1% |
| 0.01449275362 | 25 | 0.1% |
| 0.01470588235 | 19 | < 0.1% |
| 0.01492537313 | 22 | < 0.1% |
| 0.01515151515 | 18 | < 0.1% |
| 0.01538461538 | 34 | 0.1% |
| 0.015625 | 17 | < 0.1% |
| 0.01587301587 | 18 | < 0.1% |
| 0.01612903226 | 29 | 0.1% |
| Value | Count | Frequency (%) |
| 1 | 86 | |
| 0.75 | 2 | < 0.1% |
| 0.6666666667 | 14 | < 0.1% |
| 0.625 | 1 | < 0.1% |
| 0.6 | 8 | < 0.1% |
| 0.5 | 158 | |
| 0.4736842105 | 1 | < 0.1% |
| 0.4545454545 | 1 | < 0.1% |
| 0.4444444444 | 1 | < 0.1% |
| 0.4285714286 | 14 | < 0.1% |
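The most common non-zero values are simple fractions (0.1428571429 = 1/7, 0.125 = 1/8, 0.1 = 1/10), as expected for a ratio of residue counts over short sequences. A minimal sketch, assuming the column follows the usual definition of aromaticity as the relative frequency of Phe, Trp and Tyr residues; the report does not say how the value was computed, and the example peptides are made up:

```python
AROMATIC = frozenset("FWY")  # phenylalanine, tryptophan, tyrosine

def aromaticity(sequence: str) -> float:
    """Fraction of residues in a protein sequence that are F, W or Y."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for residue in seq if residue in AROMATIC) / len(seq)

one_in_seven = aromaticity("MKFLAGS")  # one aromatic residue out of seven
no_aromatics = aromaticity("MKLAGS")   # no aromatic residues, so 0.0
```

Under this definition, a value of exactly 0 means the sequence contains no F, W or Y at all, which is plausible for the 16.5% of rows flagged as zeros if many sequences are short.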
Instability_index
Real number (ℝ)
Zeros 
| Distinct | 39281 |
|---|---|
| Distinct (%) | 78.7% |
| Missing | 84 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.581297 |
| Minimum | -86.5 |
|---|---|
| Maximum | 388.53333 |
| Zeros | 836 |
| Zeros (%) | 1.7% |
| Negative | 3261 |
| Negative (%) | 6.5% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | -86.5 |
|---|---|
| 5-th percentile | -3.9905 |
| Q1 | 17.655254 |
| median | 33.759336 |
| Q3 | 50.438469 |
| 95-th percentile | 82.593357 |
| Maximum | 388.53333 |
| Range | 475.03333 |
| Interquartile range (IQR) | 32.783216 |
Descriptive statistics
| Standard deviation | 29.032043 |
|---|---|
| Coefficient of variation (CV) | 0.8159355 |
| Kurtosis | 5.7399298 |
| Mean | 35.581297 |
| Median Absolute Deviation (MAD) | 16.372123 |
| Skewness | 1.1322243 |
| Sum | 1776076 |
| Variance | 842.85954 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 836 | 1.7% |
| 5 | 510 | 1.0% |
| 6.666666667 | 321 | 0.6% |
| 7.5 | 181 | 0.4% |
| 8 | 138 | 0.3% |
| -13.725 | 94 | 0.2% |
| 8.333333333 | 84 | 0.2% |
| 55.65 | 82 | 0.2% |
| -37.45 | 79 | 0.2% |
| -21.63333333 | 78 | 0.2% |
| Other values (39271) | 47513 | |
| (Missing) | 84 | 0.2% |
| Value | Count | Frequency (%) |
| -86.5 | 1 | < 0.1% |
| -79.55 | 1 | < 0.1% |
| -72.525 | 4 | < 0.1% |
| -71.73333333 | 4 | < 0.1% |
| -70.15 | 21 | |
| -69.1 | 2 | < 0.1% |
| -68.56666667 | 2 | < 0.1% |
| -67.65 | 1 | < 0.1% |
| -65.3 | 1 | < 0.1% |
| -60.6625 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 388.5333333 | 1 | < 0.1% |
| 306.2666667 | 1 | < 0.1% |
| 299.6 | 1 | < 0.1% |
| 291.4 | 7 | |
| 289.8571429 | 1 | < 0.1% |
| 264.8 | 1 | < 0.1% |
| 261.8 | 6 | |
| 242.6222222 | 1 | < 0.1% |
| 240.85 | 1 | < 0.1% |
| 231.725 | 1 | < 0.1% |
Isoelectric_point
Real number (ℝ)
| Distinct | 19776 |
|---|---|
| Distinct (%) | 39.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.8118843 |
| Minimum | 4.0500284 |
|---|---|
| Maximum | 11.999968 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 4.0500284 |
|---|---|
| 5-th percentile | 4.0500284 |
| Q1 | 4.6111429 |
| median | 6.0682673 |
| Q3 | 9.1348639 |
| 95-th percentile | 10.655415 |
| Maximum | 11.999968 |
| Range | 7.9499393 |
| Interquartile range (IQR) | 4.5237209 |
Descriptive statistics
| Standard deviation | 2.3625355 |
|---|---|
| Coefficient of variation (CV) | 0.34682554 |
| Kurtosis | -1.2634447 |
| Mean | 6.8118843 |
| Median Absolute Deviation (MAD) | 1.9033197 |
| Skewness | 0.41950469 |
| Sum | 340594.22 |
| Variance | 5.5815738 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 4.050028419 | 4503 | 9.0% |
| 5.525000191 | 838 | 1.7% |
| 11.99996777 | 551 | 1.1% |
| 8.750052071 | 448 | 0.9% |
| 9.750021172 | 216 | 0.4% |
| 5.57001667 | 181 | 0.4% |
| 11.00083675 | 155 | 0.3% |
| 5.240009499 | 141 | 0.3% |
| 4.370259285 | 141 | 0.3% |
| 5.494989204 | 138 | 0.3% |
| Other values (19766) | 42688 |
| Value | Count | Frequency (%) |
| 4.050028419 | 4503 | |
| 4.051619911 | 1 | < 0.1% |
| 4.052074623 | 1 | < 0.1% |
| 4.05224514 | 1 | < 0.1% |
| 4.052586174 | 1 | < 0.1% |
| 4.052756691 | 1 | < 0.1% |
| 4.052984047 | 1 | < 0.1% |
| 4.053097725 | 2 | < 0.1% |
| 4.053211403 | 1 | < 0.1% |
| 4.053268242 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 11.99996777 | 551 | |
| 11.94213963 | 1 | < 0.1% |
| 11.936273 | 2 | < 0.1% |
| 11.93047085 | 1 | < 0.1% |
| 11.92453976 | 1 | < 0.1% |
| 11.91706142 | 1 | < 0.1% |
| 11.91042118 | 1 | < 0.1% |
| 11.90887394 | 2 | < 0.1% |
| 11.90784245 | 1 | < 0.1% |
| 11.90552158 | 1 | < 0.1% |
Helix_fraction
Real number (ℝ)
Zeros 
| Distinct | 1179 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.29488597 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 2231 |
| Zeros (%) | 4.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.074074074 |
| Q1 | 0.23684211 |
| median | 0.2962963 |
| Q3 | 0.35294118 |
| 95-th percentile | 0.49056604 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.11609907 |
Descriptive statistics
| Standard deviation | 0.12564571 |
|---|---|
| Coefficient of variation (CV) | 0.42608236 |
| Kurtosis | 6.2035764 |
| Mean | 0.29488597 |
| Median Absolute Deviation (MAD) | 0.057549858 |
| Skewness | 0.90052166 |
| Sum | 14744.299 |
| Variance | 0.015786845 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.3333333333 | 2510 | 5.0% |
| 0 | 2231 | 4.5% |
| 0.25 | 1581 | 3.2% |
| 0.2857142857 | 1170 | 2.3% |
| 0.5 | 1046 | 2.1% |
| 0.2 | 809 | 1.6% |
| 0.3 | 793 | 1.6% |
| 0.4 | 694 | 1.4% |
| 0.2727272727 | 567 | 1.1% |
| 0.3076923077 | 552 | 1.1% |
| Other values (1169) | 38047 |
| Value | Count | Frequency (%) |
| 0 | 2231 | |
| 0.01449275362 | 1 | < 0.1% |
| 0.015625 | 1 | < 0.1% |
| 0.01639344262 | 1 | < 0.1% |
| 0.01851851852 | 1 | < 0.1% |
| 0.02040816327 | 2 | < 0.1% |
| 0.02173913043 | 1 | < 0.1% |
| 0.02222222222 | 4 | < 0.1% |
| 0.02272727273 | 3 | < 0.1% |
| 0.025 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 314 | |
| 0.9090909091 | 1 | < 0.1% |
| 0.875 | 3 | < 0.1% |
| 0.8571428571 | 1 | < 0.1% |
| 0.8571428571 | 1 | < 0.1% |
| 0.8333333333 | 2 | < 0.1% |
| 0.8 | 10 | < 0.1% |
| 0.75 | 45 | 0.1% |
| 0.7333333333 | 1 | < 0.1% |
| 0.7272727273 | 1 | < 0.1% |
Turn_fraction
Real number (ℝ)
Zeros 
| Distinct | 904 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.2061058 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 2985 |
| Zeros (%) | 6.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.14285714 |
| median | 0.2 |
| Q3 | 0.25531915 |
| 95-th percentile | 0.37931034 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.11246201 |
Descriptive statistics
| Standard deviation | 0.11392829 |
|---|---|
| Coefficient of variation (CV) | 0.55276607 |
| Kurtosis | 9.5106599 |
| Mean | 0.2061058 |
| Median Absolute Deviation (MAD) | 0.057142857 |
| Skewness | 1.7315084 |
| Sum | 10305.29 |
| Variance | 0.012979656 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 2985 | 6.0% |
| 0.25 | 1643 | 3.3% |
| 0.2 | 1547 | 3.1% |
| 0.1666666667 | 1323 | 2.6% |
| 0.3333333333 | 1232 | 2.5% |
| 0.1428571429 | 1049 | 2.1% |
| 0.2222222222 | 793 | 1.6% |
| 0.125 | 698 | 1.4% |
| 0.1818181818 | 683 | 1.4% |
| 0.2857142857 | 638 | 1.3% |
| Other values (894) | 37409 |
| Value | Count | Frequency (%) |
| 0 | 2985 | |
| 0.01612903226 | 1 | < 0.1% |
| 0.01639344262 | 1 | < 0.1% |
| 0.02040816327 | 2 | < 0.1% |
| 0.02083333333 | 2 | < 0.1% |
| 0.02173913043 | 1 | < 0.1% |
| 0.02222222222 | 1 | < 0.1% |
| 0.02325581395 | 1 | < 0.1% |
| 0.02380952381 | 1 | < 0.1% |
| 0.025 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 189 | |
| 0.935483871 | 1 | < 0.1% |
| 0.8823529412 | 1 | < 0.1% |
| 0.875 | 1 | < 0.1% |
| 0.8571428571 | 1 | < 0.1% |
| 0.8571428571 | 1 | < 0.1% |
| 0.8571428571 | 2 | < 0.1% |
| 0.8333333333 | 2 | < 0.1% |
| 0.8181818182 | 1 | < 0.1% |
| 0.8 | 10 | < 0.1% |
Sheet_fraction
Real number (ℝ)
Zeros 
| Distinct | 1001 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.2576687 |
| Minimum | 0 |
| Maximum | 1 |
| Zeros | 2367 |
| Zeros (%) | 4.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.051282051 |
| Q1 | 0.1875 |
| median | 0.25 |
| Q3 | 0.32 |
| 95-th percentile | 0.45454545 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.1325 |
Descriptive statistics
| Standard deviation | 0.12874943 |
|---|---|
| Coefficient of variation (CV) | 0.49967043 |
| Kurtosis | 6.8258445 |
| Mean | 0.2576687 |
| Median Absolute Deviation (MAD) | 0.066326531 |
| Skewness | 1.3772987 |
| Sum | 12883.435 |
| Variance | 0.016576416 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 2367 | 4.7% |
| 0.3333333333 | 1857 | 3.7% |
| 0.25 | 1780 | 3.6% |
| 0.2 | 1258 | 2.5% |
| 0.2857142857 | 891 | 1.8% |
| 0.1666666667 | 885 | 1.8% |
| 0.5 | 858 | 1.7% |
| 0.2222222222 | 732 | 1.5% |
| 0.3 | 627 | 1.3% |
| 0.1428571429 | 622 | 1.2% |
| Other values (991) | 38123 | 76.2% |
| Value | Count | Frequency (%) |
| 0 | 2367 | 4.7% |
| 0.02173913043 | 1 | < 0.1% |
| 0.02325581395 | 1 | < 0.1% |
| 0.0243902439 | 1 | < 0.1% |
| 0.025 | 1 | < 0.1% |
| 0.02857142857 | 2 | < 0.1% |
| 0.02941176471 | 2 | < 0.1% |
| 0.0303030303 | 1 | < 0.1% |
| 0.03125 | 4 | < 0.1% |
| 0.03333333333 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 322 | 0.6% |
| 0.875 | 1 | < 0.1% |
| 0.8666666667 | 1 | < 0.1% |
| 0.8333333333 | 1 | < 0.1% |
| 0.8333333333 | 7 | < 0.1% |
| 0.8181818182 | 1 | < 0.1% |
| 0.8181818182 | 1 | < 0.1% |
| 0.8 | 17 | < 0.1% |
| 0.7857142857 | 1 | < 0.1% |
| 0.75 | 61 | 0.1% |
Reduced_coefficient
Real number (ℝ)
High correlation  Zeros 
| Distinct | 80 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5004.6842 |
| Minimum | 0 |
| Maximum | 45490 |
| Zeros | 13696 |
| Zeros (%) | 27.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2980 |
| Q3 | 7450 |
| 95-th percentile | 15470 |
| Maximum | 45490 |
| Range | 45490 |
| Interquartile range (IQR) | 7450 |
Descriptive statistics
| Standard deviation | 5526.0936 |
|---|---|
| Coefficient of variation (CV) | 1.1041843 |
| Kurtosis | 2.6617344 |
| Mean | 5004.6842 |
| Median Absolute Deviation (MAD) | 2980 |
| Skewness | 1.4823222 |
| Sum | 2.5023421 × 10⁸ |
| Variance | 30537710 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 13696 | 27.4% |
| 1490 | 8108 | 16.2% |
| 2980 | 4944 | 9.9% |
| 6990 | 3007 | 6.0% |
| 5500 | 2959 | 5.9% |
| 4470 | 2774 | 5.5% |
| 8480 | 2635 | 5.3% |
| 9970 | 1726 | 3.5% |
| 5960 | 1472 | 2.9% |
| 12490 | 1056 | 2.1% |
| Other values (70) | 7623 | 15.2% |
| Value | Count | Frequency (%) |
| 0 | 13696 | 27.4% |
| 1490 | 8108 | 16.2% |
| 2980 | 4944 | 9.9% |
| 4470 | 2774 | 5.5% |
| 5500 | 2959 | 5.9% |
| 5960 | 1472 | 2.9% |
| 6990 | 3007 | 6.0% |
| 7450 | 660 | 1.3% |
| 8480 | 2635 | 5.3% |
| 8940 | 313 | 0.6% |
| Value | Count | Frequency (%) |
| 45490 | 2 | < 0.1% |
| 44920 | 1 | < 0.1% |
| 44000 | 1 | < 0.1% |
| 41940 | 1 | < 0.1% |
| 40450 | 1 | < 0.1% |
| 39990 | 1 | < 0.1% |
| 38960 | 1 | < 0.1% |
| 38500 | 1 | < 0.1% |
| 38390 | 1 | < 0.1% |
| 37930 | 3 | < 0.1% |
Oxidized_coefficient
Real number (ℝ)
High correlation  Zeros 
| Distinct | 227 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5023.2342 |
| Minimum | 0 |
| Maximum | 45490 |
| Zeros | 13220 |
| Zeros (%) | 26.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2980 |
| Q3 | 7575 |
| 95-th percentile | 15720 |
| Maximum | 45490 |
| Range | 45490 |
| Interquartile range (IQR) | 7575 |
Descriptive statistics
| Standard deviation | 5536.0772 |
|---|---|
| Coefficient of variation (CV) | 1.1020942 |
| Kurtosis | 2.6467933 |
| Mean | 5023.2342 |
| Median Absolute Deviation (MAD) | 2980 |
| Skewness | 1.4791047 |
| Sum | 2.5116171 × 10⁸ |
| Variance | 30648150 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 13220 | 26.4% |
| 1490 | 7426 | 14.9% |
| 2980 | 4331 | 8.7% |
| 5500 | 2688 | 5.4% |
| 6990 | 2581 | 5.2% |
| 4470 | 2303 | 4.6% |
| 8480 | 2157 | 4.3% |
| 9970 | 1383 | 2.8% |
| 5960 | 1193 | 2.4% |
| 12490 | 897 | 1.8% |
| Other values (217) | 11821 | 23.6% |
| Value | Count | Frequency (%) |
| 0 | 13220 | 26.4% |
| 125 | 407 | 0.8% |
| 250 | 56 | 0.1% |
| 375 | 11 | < 0.1% |
| 500 | 1 | < 0.1% |
| 625 | 1 | < 0.1% |
| 1490 | 7426 | 14.9% |
| 1615 | 555 | 1.1% |
| 1740 | 99 | 0.2% |
| 1865 | 25 | 0.1% |
| Value | Count | Frequency (%) |
| 45490 | 2 | < 0.1% |
| 44920 | 1 | < 0.1% |
| 44125 | 1 | < 0.1% |
| 42065 | 1 | < 0.1% |
| 40575 | 1 | < 0.1% |
| 39990 | 1 | < 0.1% |
| 38960 | 1 | < 0.1% |
| 38515 | 1 | < 0.1% |
| 38500 | 1 | < 0.1% |
| 37930 | 3 | < 0.1% |
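The value tables for Reduced_coefficient and Oxidized_coefficient look like molar extinction coefficients at 280 nm: multiples of 1490 and 5500 dominate both columns, and Oxidized_coefficient adds steps of 125. The sketch below shows that standard calculation, assuming the usual per-residue constants (tryptophan 5500, tyrosine 1490, one cystine per pair of cysteines 125). That the dataset was derived this way is an inference from the values, and the function name and example sequence are hypothetical.

```python
from typing import Tuple

# Assumed per-residue molar extinction coefficients at 280 nm (M^-1 cm^-1).
TRP, TYR, CYSTINE = 5500, 1490, 125

def extinction_coefficients(sequence: str) -> Tuple[int, int]:
    """Return (reduced, oxidized) coefficients for a protein sequence."""
    seq = sequence.upper()
    reduced = seq.count("W") * TRP + seq.count("Y") * TYR
    # Oxidized form: every pair of cysteines is assumed to form one cystine.
    oxidized = reduced + (seq.count("C") // 2) * CYSTINE
    return reduced, oxidized

# Hypothetical sequence, not taken from the dataset: 1 Trp, 2 Tyr, 2 Cys.
print(extinction_coefficients("MWYYCCK"))  # (8480, 8605)
```

Under this reading, a zero means the sequence has no tryptophan or tyrosine (and, for the oxidized form, fewer than two cysteines). The two columns differ only by the small cystine term, which fits their reported correlation of 0.998.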
Phage_source
Categorical
High correlation 
| Distinct | 14 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.9 MiB |
| IMG_VR | 13985 |
|---|---|
| MGV | 12188 |
| GPD | 8837 |
| GOV2 | 6091 |
| TemPhD | 4053 |
| Other values (9) | 4846 |
Length
| Max length | 8 |
|---|---|
| Median length | 7 |
| Mean length | 4.34912 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | RefSeq |
|---|---|
| 2nd row | RefSeq |
| 3rd row | RefSeq |
| 4th row | RefSeq |
| 5th row | RefSeq |
Common Values
| Value | Count | Frequency (%) |
| IMG_VR | 13985 | 28.0% |
| MGV | 12188 | 24.4% |
| GPD | 8837 | 17.7% |
| GOV2 | 6091 | 12.2% |
| TemPhD | 4053 | 8.1% |
| CHVD | 2240 | 4.5% |
| GVD | 853 | 1.7% |
| RefSeq | 521 | 1.0% |
| PhagesDB | 409 | 0.8% |
| IGVD | 408 | 0.8% |
| Other values (4) | 415 | 0.8% |
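The Frequency (%) column is each count divided by the total number of observations. As a minimal sketch, the snippet below recomputes those shares from the Phage_source counts listed above; the dictionary and variable names are illustrative.

```python
# Counts copied from the Phage_source "Common Values" table.
counts = {
    "IMG_VR": 13985, "MGV": 12188, "GPD": 8837, "GOV2": 6091,
    "TemPhD": 4053, "CHVD": 2240, "GVD": 853, "RefSeq": 521,
    "PhagesDB": 409, "IGVD": 408, "Other values (4)": 415,
}

total = sum(counts.values())  # should equal the number of observations

# Share of each source, as a percentage rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in counts.items()}

for name, share in shares.items():
    print(f"{name:<18}{counts[name]:>7}{share:>7.1f}%")
```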
Words
| Value | Count | Frequency (%) |
| img_vr | 13985 | 28.0% |
| mgv | 12188 | 24.4% |
| gpd | 8837 | 17.7% |
| gov2 | 6091 | 12.2% |
| temphd | 4053 | 8.1% |
| chvd | 2240 | 4.5% |
| gvd | 853 | 1.7% |
| refseq | 521 | 1.0% |
| phagesdb | 409 | 0.8% |
| igvd | 408 | 0.8% |
| Other values (4) | 415 | 0.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| G | 42604 | 19.6% |
| V | 35911 | 16.5% |
| M | 26180 | 12.0% |
| D | 16840 | 7.7% |
| R | 14506 | 6.7% |
| I | 14393 | 6.6% |
| _ | 13985 | 6.4% |
| P | 13299 | 6.1% |
| O | 6091 | 2.8% |
| 2 | 6091 | 2.8% |
| Other values (19) | 27556 | 12.7% |
Function_Prediction_source
Categorical
High correlation  Missing 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 27130 |
| Missing (%) | 54.3% |
| Memory size | 2.8 MiB |
| - | 12421 |
|---|---|
| eggNOG-mapper | 8735 |
| Iterative search | 1714 |
Length
| Max length | 16 |
|---|---|
| Median length | 1 |
| Mean length | 6.707477 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | eggNOG-mapper |
|---|---|
| 2nd row | - |
| 3rd row | eggNOG-mapper |
| 4th row | - |
| 5th row | eggNOG-mapper |
Common Values
| Value | Count | Frequency (%) |
| - | 12421 | 24.8% |
| eggNOG-mapper | 8735 | 17.5% |
| Iterative search | 1714 | 3.4% |
| (Missing) | 27130 | 54.3% |
Words
| Value | Count | Frequency (%) |
| | 12421 | 50.5% |
| eggnog-mapper | 8735 | 35.5% |
| iterative | 1714 | 7.0% |
| search | 1714 | 7.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 22612 | 14.7% |
| - | 21156 | 13.8% |
| g | 17470 | 11.4% |
| p | 17470 | 11.4% |
| a | 12163 | 7.9% |
| r | 12163 | 7.9% |
| G | 8735 | 5.7% |
| O | 8735 | 5.7% |
| N | 8735 | 5.7% |
| m | 8735 | 5.7% |
| Other values (8) | 15426 | 10.1% |
Interactions
Correlations
| | Aromaticity | Function_Prediction_source | Function_prediction_source | Helix_fraction | Instability_index | Isoelectric_point | Molecular_weight | Oxidized_coefficient | Phage_source | Protein_source | Reduced_coefficient | Sheet_fraction | Start | Stop | Strand | Turn_fraction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aromaticity | 1.000 | 0.011 | 0.023 | 0.464 | -0.010 | -0.010 | 0.208 | 0.600 | 0.020 | 0.010 | 0.603 | -0.233 | 0.040 | 0.039 | 0.001 | -0.039 |
| Function_Prediction_source | 0.011 | 1.000 | 0.000 | 0.039 | 0.016 | 0.045 | 0.031 | 0.000 | 0.333 | 1.000 | 0.000 | 0.060 | 0.066 | 0.065 | 0.012 | 0.072 |
| Function_prediction_source | 0.023 | 0.000 | 1.000 | 0.039 | 0.002 | 0.021 | 0.022 | 0.012 | 0.822 | 1.000 | 0.012 | 0.001 | 0.095 | 0.094 | 0.053 | 0.028 |
| Helix_fraction | 0.464 | 0.039 | 0.039 | 1.000 | -0.136 | -0.053 | 0.068 | 0.248 | 0.016 | 0.000 | 0.252 | -0.062 | 0.042 | 0.041 | 0.013 | -0.196 |
| Instability_index | -0.010 | 0.016 | 0.002 | -0.136 | 1.000 | -0.036 | 0.171 | 0.085 | 0.000 | 0.000 | 0.081 | 0.147 | 0.003 | 0.002 | 0.010 | 0.016 |
| Isoelectric_point | -0.010 | 0.045 | 0.021 | -0.053 | -0.036 | 1.000 | 0.050 | 0.025 | 0.022 | 0.000 | 0.026 | -0.287 | -0.006 | -0.006 | 0.006 | 0.002 |
| Molecular_weight | 0.208 | 0.031 | 0.022 | 0.068 | 0.171 | 0.050 | 1.000 | 0.629 | 0.012 | 0.003 | 0.622 | 0.035 | 0.013 | 0.011 | 0.000 | 0.019 |
| Oxidized_coefficient | 0.600 | 0.000 | 0.012 | 0.248 | 0.085 | 0.025 | 0.629 | 1.000 | 0.013 | 0.009 | 0.998 | -0.110 | 0.025 | 0.024 | 0.000 | 0.005 |
| Phage_source | 0.020 | 0.333 | 0.822 | 0.016 | 0.000 | 0.022 | 0.012 | 0.013 | 1.000 | 1.000 | 0.014 | 0.023 | 0.077 | 0.078 | 0.060 | 0.020 |
| Protein_source | 0.010 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.003 | 0.009 | 1.000 | 1.000 | 0.010 | 0.000 | 0.079 | 0.079 | 0.041 | 0.000 |
| Reduced_coefficient | 0.603 | 0.000 | 0.012 | 0.252 | 0.081 | 0.026 | 0.622 | 0.998 | 0.014 | 0.010 | 1.000 | -0.108 | 0.025 | 0.023 | 0.000 | 0.004 |
| Sheet_fraction | -0.233 | 0.060 | 0.001 | -0.062 | 0.147 | -0.287 | 0.035 | -0.110 | 0.023 | 0.000 | -0.108 | 1.000 | -0.021 | -0.025 | 0.017 | -0.333 |
| Start | 0.040 | 0.066 | 0.095 | 0.042 | 0.003 | -0.006 | 0.013 | 0.025 | 0.077 | 0.079 | 0.025 | -0.021 | 1.000 | 0.999 | 0.005 | -0.010 |
| Stop | 0.039 | 0.065 | 0.094 | 0.041 | 0.002 | -0.006 | 0.011 | 0.024 | 0.078 | 0.079 | 0.023 | -0.025 | 0.999 | 1.000 | 0.000 | -0.005 |
| Strand | 0.001 | 0.012 | 0.053 | 0.013 | 0.010 | 0.006 | 0.000 | 0.000 | 0.060 | 0.041 | 0.000 | 0.017 | 0.005 | 0.000 | 1.000 | 0.008 |
| Turn_fraction | -0.039 | 0.072 | 0.028 | -0.196 | 0.016 | 0.002 | 0.019 | 0.005 | 0.020 | 0.000 | 0.004 | -0.333 | -0.010 | -0.005 | 0.008 | 1.000 |
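Every entry involving a categorical variable in the matrix above lies in [0, 1], which is the range of an association measure such as Cramér's V. The report does not state which method produced those entries, so the following is only a sketch of plain (uncorrected) Cramér's V on toy data. It assumes this kind of measure is what is being shown, and the function name and inputs are illustrative.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Plain (uncorrected) Cramér's V for two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed counts per (x, y) cell
    row = Counter(xs)              # marginal counts for the first variable
    col = Counter(ys)              # marginal counts for the second variable

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Toy data: one variable fully determines the other, so V is 1.0.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
# Toy data: the two variables are independent, so V is 0.0.
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))
```

A value of 1.000, as between Protein_source and Phage_source, would then mean that one variable fully determines the other, not that there is a linear relationship.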
Missing values
Sample
| | Phage_ID | Protein_source | Function_prediction_source | Start | Stop | Strand | Protein_ID | Product | Protein_classification | Molecular_weight | Aromaticity | Instability_index | Isoelectric_point | Helix_fraction | Turn_fraction | Sheet_fraction | Reduced_coefficient | Oxidized_coefficient | Phage_source | Function_Prediction_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NC_011019.1 | RefSeq | RefSeq | 51457 | 51993 | - | YP_001994544.1 | hypothetical protein | hypothetical; | 4439.9769 | 0.078947 | 40.926316 | 5.134573 | 0.236842 | 0.131579 | 0.394737 | 4470 | 4470 | RefSeq | NaN |
| 1 | NC_008723.1 | RefSeq | RefSeq | 8638 | 9177 | + | YP_950675.1 | major tail protein | infection; | 4196.4307 | 0.102564 | 36.107692 | 4.050028 | 0.205128 | 0.333333 | 0.333333 | 13980 | 13980 | RefSeq | NaN |
| 2 | NC_017968.1 | RefSeq | RefSeq | 30673 | 31647 | - | YP_006382285.1 | major capsid protein | assembly; | 4879.5720 | 0.045455 | 48.911364 | 6.037461 | 0.272727 | 0.113636 | 0.340909 | 0 | 0 | RefSeq | NaN |
| 3 | NC_009820.1 | RefSeq | RefSeq | 52261 | 52434 | + | YP_001469324.1 | hypothetical protein | hypothetical; | 6640.5466 | 0.052632 | 69.292982 | 4.906194 | 0.192982 | 0.263158 | 0.350877 | 12490 | 12615 | RefSeq | NaN |
| 4 | NC_018863.1 | RefSeq | RefSeq | 79762 | 80397 | - | YP_006908410.1 | RNA polymerase sigma factor | replication; | 75.0666 | 0.000000 | 0.000000 | 5.525000 | 0.000000 | 1.000000 | 0.000000 | 0 | 0 | RefSeq | NaN |
| 5 | NC_021309.1 | RefSeq | RefSeq | 45734 | 46045 | + | YP_008052003.1 | hypothetical protein | hypothetical; | 3862.3716 | 0.121212 | 18.009091 | 6.745049 | 0.272727 | 0.121212 | 0.151515 | 8480 | 8605 | RefSeq | NaN |
| 6 | NC_019543.1 | RefSeq | RefSeq | 125465 | 126616 | - | YP_007010876.1 | RNA ligase and tail fiber protein attachment catalyst | lysis;replication;infection; | 4002.4203 | 0.151515 | 46.812121 | 4.751137 | 0.333333 | 0.090909 | 0.303030 | 11460 | 11460 | RefSeq | NaN |
| 7 | NC_014636.1 | RefSeq | RefSeq | 213015 | 214970 | + | YP_003969626.1 | baseplate wedge subunit | infection; | 2521.8880 | 0.095238 | 25.361905 | 6.143238 | 0.380952 | 0.142857 | 0.238095 | 1490 | 1490 | RefSeq | NaN |
| 8 | NC_019725.1 | RefSeq | RefSeq | 33766 | 34455 | + | YP_007112724.1 | membrane associated protein | assembly; | 2026.2280 | 0.052632 | 15.836842 | 5.839661 | 0.210526 | 0.263158 | 0.210526 | 1490 | 1490 | RefSeq | NaN |
| 9 | NC_005859.1 | RefSeq | RefSeq | 43505 | 44143 | - | YP_006913.1 | tail fiber protein | infection; | 246.2603 | 0.000000 | 5.000000 | 4.598695 | 0.500000 | 0.000000 | 0.500000 | 0 | 0 | RefSeq | NaN |
| | Phage_ID | Protein_source | Function_prediction_source | Start | Stop | Strand | Protein_ID | Product | Protein_classification | Molecular_weight | Aromaticity | Instability_index | Isoelectric_point | Helix_fraction | Turn_fraction | Sheet_fraction | Reduced_coefficient | Oxidized_coefficient | Phage_source | Function_Prediction_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49990 | biochar_1561 | prodigal | NaN | 25420 | 25884 | + | biochar_1561_32 | unknown | unsorted; | 1405.5551 | 0.000000 | -10.964286 | 5.210851 | 0.285714 | 0.285714 | 0.142857 | 0 | 0 | STV | - |
| 49991 | biochar_1290 | prodigal | NaN | 9261 | 9680 | + | biochar_1290_21 | unknown | unsorted; | 6911.0897 | 0.000000 | 75.334783 | 8.351057 | 0.275362 | 0.376812 | 0.217391 | 0 | 0 | STV | - |
| 49992 | biochar_3952 | prodigal | NaN | 12725 | 13378 | + | biochar_3952_16 | pectinesterase activity | unsorted; | 661.7448 | 0.000000 | -15.685714 | 5.570017 | 0.285714 | 0.285714 | 0.142857 | 0 | 0 | STV | Iterative search |
| 49993 | biochar_3049 | prodigal | NaN | 15377 | 15877 | + | biochar_3049_24 | unknown | unsorted; | 3524.0537 | 0.259259 | 26.418519 | 8.112137 | 0.444444 | 0.037037 | 0.148148 | 9970 | 9970 | STV | - |
| 49994 | biochar_4543 | prodigal | NaN | 10187 | 10534 | + | biochar_4543_12 | unknown | unsorted; | 5123.7085 | 0.111111 | 62.642222 | 5.304181 | 0.311111 | 0.177778 | 0.377778 | 2980 | 2980 | STV | - |
| 49995 | biochar_4936 | prodigal | NaN | 10051 | 11601 | + | biochar_4936_9 | Belongs to the glycosyl hydrolase 28 family | unsorted; | 2918.1388 | 0.111111 | 2.844444 | 6.659563 | 0.296296 | 0.333333 | 0.185185 | 0 | 0 | STV | eggNOG-mapper |
| 49996 | biochar_1323 | prodigal | NaN | 15408 | 17372 | - | biochar_1323_33 | phage tail tape measure protein | assembly;infection; | 2903.2569 | 0.125000 | 34.941667 | 8.791763 | 0.333333 | 0.125000 | 0.250000 | 0 | 0 | STV | eggNOG-mapper |
| 49997 | biochar_2347 | prodigal | NaN | 17625 | 17918 | + | biochar_2347_27 | unknown | unsorted; | 3046.4022 | 0.111111 | 48.225926 | 4.050028 | 0.444444 | 0.222222 | 0.296296 | 2980 | 2980 | STV | - |
| 49998 | biochar_2839 | prodigal | NaN | 4787 | 6559 | - | biochar_2839_5 | Required for morphogenesis and for the elongation of the flagellar filament by facilitating polymerization of the flagellin monomers at the tip of growing filament. Forms a capping structure, which prevents flagellin subunits (transported through the central channel of the flagellum) from leaking out without polymerization at the distal end | unsorted; | 3199.5959 | 0.033333 | 34.963333 | 6.182798 | 0.233333 | 0.166667 | 0.366667 | 0 | 0 | STV | eggNOG-mapper |
| 49999 | biochar_1064 | prodigal | NaN | 32415 | 32678 | - | biochar_1064_29 | unknown | unsorted; | 1994.2079 | 0.117647 | 83.764706 | 6.326430 | 0.235294 | 0.117647 | 0.411765 | 5500 | 5500 | STV | - |